Probabilistic analysis of the human transcriptome with side information
Understanding functional organization of genetic information is a major
challenge in modern biology. Following the initial publication of the human
genome sequence in 2001, advances in high-throughput measurement technologies
and efficient sharing of research material through community databases have
opened up new views to the study of living organisms and the structure of life.
In this thesis, novel computational strategies have been developed to
investigate a key functional layer of genetic information, the human
transcriptome, which regulates the function of living cells through protein
synthesis. The key contributions of the thesis are general exploratory tools
for high-throughput data analysis that have provided new insights into
cell-biological networks, cancer mechanisms and other aspects of genome
function.
A central challenge in functional genomics is that high-dimensional genomic
observations are associated with high levels of complex and largely unknown
sources of variation. By combining statistical evidence across multiple
measurement sources and the wealth of background information in genomic data
repositories, it has been possible to resolve some of the uncertainties associated
with individual observations and to identify functional mechanisms that could
not be detected based on individual measurement sources. Statistical learning
and probabilistic models provide a natural framework for such modeling tasks.
Open source implementations of the key methodological contributions have been
released to facilitate further adoption of the developed methods by the
research community.
Comment: Doctoral thesis. 103 pages, 11 figures
A Quantitative Study of History in the English Short-Title Catalogue (ESTC), 1470-1800
This article analyses publication trends in the field of history in early modern Britain and North America in 1470–1800, based on English Short-Title Catalogue (ESTC) data. Its major contribution is to demonstrate the potential of digitized library catalogues as an essential scholastic tool and part of reproducible research. We also introduce a novel way of quantitatively analysing a particular trend in book production, namely the publishing of works in the field of history. The study is also our first experimental analysis of paper consumption in early modern book production, and demonstrates in practice the importance of open-science principles for library and information science. Three main research questions are addressed: 1) who wrote history; 2) where history was published; and 3) how publishing changed over time in early modern Britain and North America. In terms of our main findings we demonstrate that the average book size of history publications decreased over time, and that the octavo-sized book was the rising star in the eighteenth century, which is a true indication of expanding audiences. The article also compares different aspects of the most popular writers on history, such as Edmund Burke and David Hume. Although focusing on history, these findings may reflect more widespread publishing trends in the early modern era. We show how some of the key questions in this field can be addressed through the quantitative analysis of large-scale bibliographic data collections.
Peer reviewed
FdeSolver: A Julia Package for Solving Fractional Differential Equations
Implementing and executing numerical algorithms to solve fractional
differential equations has been less straightforward than using their
integer-order counterparts, posing challenges for practitioners who wish to
incorporate fractional calculus in applied case studies. Hence, we created an
open-source Julia package, FdeSolver, that provides numerical solutions for
fractional-order differential equations based on product-integration rules,
predictor-corrector algorithms, and the Newton-Raphson method. The package
covers solutions for one-dimensional equations with orders of positive real
numbers. For high-dimensional systems, the orders are limited to positive real
numbers less than or equal to one. Incommensurate derivatives are allowed
and defined in the Caputo sense. Here, we summarize the implementation for a
representative class of problems, provide comparisons with available
alternatives in Julia and Matlab, describe our adherence to good practices in
open research software development, and demonstrate the practical performance
of the methods in two applications; we show how to simulate microbial community
dynamics and model the spread of Covid-19 by fitting the order of derivatives
based on epidemiological observations. Overall, these results highlight the
efficiency, reliability, and practicality of the FdeSolver Julia package.
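To make the predictor-corrector idea mentioned in this abstract concrete, here is a minimal sketch of a fractional Adams-Bashforth-Moulton scheme for a scalar Caputo equation D^alpha y = f(t, y) with 0 < alpha <= 1. It is written in Python for illustration only and is not FdeSolver's API or implementation; the function name `caputo_pece` and its parameters are hypothetical, and the weights follow the standard textbook form of the scheme rather than anything stated in the abstract.

```python
import math
import numpy as np


def caputo_pece(f, y0, alpha, t_end, n_steps):
    """Predictor-corrector sketch for D^alpha y = f(t, y), y(0) = y0.

    Hypothetical helper: scalar problems with Caputo order 0 < alpha <= 1 only.
    Returns the time grid and the approximate solution on that grid.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("this sketch only handles 0 < alpha <= 1")
    h = t_end / n_steps
    t = np.linspace(0.0, t_end, n_steps + 1)
    y = np.empty(n_steps + 1)
    fv = np.empty(n_steps + 1)  # stored values f(t_j, y_j): the memory term
    y[0] = y0
    fv[0] = f(t[0], y0)
    c_pred = h**alpha / math.gamma(alpha + 1.0)
    c_corr = h**alpha / math.gamma(alpha + 2.0)
    for n in range(n_steps):
        j = np.arange(n + 1)
        # Predictor: product rectangle rule over the whole history.
        b = (n + 1 - j) ** alpha - (n - j) ** alpha
        y_pred = y0 + c_pred * np.dot(b, fv[: n + 1])
        # Corrector: product trapezoidal rule, using the predicted value.
        k = (n - j).astype(float)
        a = (k + 2) ** (alpha + 1) + k ** (alpha + 1) - 2 * (k + 1) ** (alpha + 1)
        a[0] = n ** (alpha + 1) - (n - alpha) * (n + 1) ** alpha
        y[n + 1] = y0 + c_corr * (f(t[n + 1], y_pred) + np.dot(a, fv[: n + 1]))
        fv[n + 1] = f(t[n + 1], y[n + 1])
    return t, y
```

Calling `caputo_pece(lambda t, y: -y, 1.0, 0.5, 1.0, 400)` integrates a fractional relaxation equation of order one half on [0, 1]. Every step sums over the full history, so the cost grows quadratically with the number of steps; that memory term is what distinguishes fractional from integer-order solvers.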
Ebola epidemic model with dynamic population and memory
The recent outbreaks of Ebola encourage researchers to develop mathematical models for simulating the
dynamics of Ebola transmission. We continue the study of the models focusing on those with a variable
population. Hence, this paper presents a compartmental model consisting of 8-dimensional nonlinear
differential equations with a dynamic population and investigates its basic reproduction number. Moreover, a
dimensionless model is introduced for numerical analysis, thus proving the disease-free equilibrium is locally
asymptotically stable whenever the threshold condition, known as a basic reproduction number, is less than
one. Finally, we use a fractional differential form of the model to sufficiently fit long time-series data of Guinea,
Liberia, and Sierra Leone retrieved from the World Health Organization, and the numerical results demonstrate
the performance of the model.
published
Dependency detection with similarity constraints
Unsupervised two-view learning, or detection of dependencies between two
paired data sets, is typically done by some variant of canonical correlation
analysis (CCA). CCA searches for a linear projection for each view, such that
the correlations between the projections are maximized. The solution is
invariant to any linear transformation of either or both of the views; for
tasks with small sample size such flexibility implies overfitting, which is
even worse for more flexible nonparametric or kernel-based dependency discovery
methods. We develop variants which reduce the degrees of freedom by assuming
constraints on similarity of the projections in the two views. A particular
example is provided by a cancer gene discovery application where chromosomal
distance affects the dependencies between gene copy number and activity levels.
Similarity constraints are shown to improve detection performance of known
cancer genes.
Comment: 9 pages, 3 figures. Appeared in proceedings of the 2009 IEEE
International Workshop on Machine Learning for Signal Processing XIX
(MLSP'09). Implementation of the method available at
http://bioconductor.org/packages/devel/bioc/html/pint.htm
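As an illustration of the classical CCA step described in this abstract, the sketch below computes the first pair of canonical projections by whitening each view and taking a singular value decomposition of the whitened cross-covariance. It is plain, unconstrained CCA in Python with NumPy, not the similarity-constrained variant of the paper and not the pint package; the function name `cca_first_pair` and the small ridge term `reg` are assumptions added for numerical stability.

```python
import numpy as np


def cca_first_pair(X, Y, reg=1e-8):
    """First canonical pair for paired views X (n x p) and Y (n x q).

    Hypothetical helper: returns projection vectors wx, wy and the
    canonical correlation between X @ wx and Y @ wy.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten each view with the Cholesky factor of its covariance.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # M = Lx^{-1} Cxy Ly^{-T} is the cross-covariance of the whitened views.
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M)
    # Map the leading singular vectors back to the original coordinates.
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0])
    return wx, wy, s[0]
```

Because each view is whitened first, replacing `X` with `X @ A` for any invertible matrix `A` leaves the returned correlation essentially unchanged. This is the invariance the abstract refers to, and it is also why unconstrained CCA can overfit when the sample size is small.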